OSCAR API for Real-Time Low-Power Multicores and Its Performance on Multicores and SMP Servers

Authors

  • Keiji Kimura
  • Masayoshi Mase
  • Hiroki Mikami
  • Takamichi Miyamoto
  • Jun Shirako
  • Hironori Kasahara
Abstract

The OSCAR (Optimally Scheduled Advanced Multiprocessor) API has been designed for real-time embedded low-power multicores, so that the OSCAR parallelizing compiler can generate parallel programs for multicores from different vendors. The OSCAR API has been developed by Waseda University in collaboration with Fujitsu Laboratory, Hitachi, NEC, Panasonic, Renesas Technology, and Toshiba in a METI/NEDO project entitled “Multicore Technology for Realtime Consumer Electronics.” By using the OSCAR API as an interface between the OSCAR compiler and backend compilers, the OSCAR compiler enables the following for various embedded multicores: hierarchical multigrain parallel processing with memory optimization under capacity restrictions for cache memory, local memory, distributed shared memory, and on-chip/off-chip shared memory; data transfer using a DMA controller; and power reduction control using DVFS (Dynamic Voltage and Frequency Scaling), clock gating, and power gating. In addition, a parallelized program automatically generated by the OSCAR compiler with the OSCAR API can be compiled by ordinary OpenMP compilers, since the OSCAR API is designed as a subset of OpenMP. This paper describes the OSCAR API and its compatibility with the OSCAR compiler by showing code examples. Performance evaluations of the OSCAR compiler and the OSCAR API are carried out using an IBM Power5+ workstation, an IBM Power6 high-end SMP server, and RP2, a newly developed consumer electronics multicore chip by Renesas, Hitachi, and Waseda. The scalability evaluation shows that, on average, the OSCAR compiler with the OSCAR API achieves a 5.8 times speedup over sequential execution on the Power5+ workstation with eight cores and a 2.9 times speedup on RP2 with four cores. In addition, the OSCAR compiler improves on the IBM XL Fortran compiler by up to 3.3 times on the Power6 SMP server. Owing to low-power optimization on RP2, the OSCAR compiler with the OSCAR API achieves a maximum power reduction of 84% in the real-time execution mode.
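
To put the reported average speedups in perspective, the short sketch below converts them into parallel efficiency (speedup divided by core count). It is an illustration only and not part of the paper: the function name `parallel_efficiency` and the dictionary labels are hypothetical, and the only inputs are the figures quoted in the abstract.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Return speedup divided by core count (1.0 would be ideal linear scaling)."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


# Average speedups and core counts as quoted in the abstract.
reported = {
    "Power5+ (8 cores)": (5.8, 8),
    "RP2 (4 cores)": (2.9, 4),
}

# Hypothetical helper structure: platform label -> efficiency in [0, 1].
efficiencies = {
    name: parallel_efficiency(speedup, cores)
    for name, (speedup, cores) in reported.items()
}

for name, eff in efficiencies.items():
    print(f"{name}: {eff:.1%} parallel efficiency")
```

Both platforms come out at about 72.5% efficiency, which suggests the average scaling reported for the embedded RP2 chip is comparable to that of the Power5+ workstation.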

Similar resources

OSCAR API v2.1: Extensions for an Advanced Accelerator Control Scheme to a Low-Power Multicore API

The number of cores in smartphones and tablet PCs is rapidly increasing along with the computational power they require. However, almost all applications on those devices have not used multiple cores for high-speed, low-power execution, since the application development environments, which allow application developers to develop parallelized applications easily and promptly, are ...

Parallelizing Compiler Framework and API for Power Reduction and Software Productivity of Real-Time Heterogeneous Multicores

Heterogeneous multicores have been attracting much attention as a way to attain high performance while keeping power consumption low in a wide range of areas. However, heterogeneous multicores impose very difficult programming on programmers. The long application program development period lowers product competitiveness. In order to overcome such a situation, this paper proposes a compilation framework which bridg...

Evaluation of Automatic Power Reduction with OSCAR Compiler on Intel Haswell and ARM Cortex-A9 Multicores

Reducing power dissipation is one of the most important issues that need to be addressed to improve the performance of all computing systems, such as supercomputers, cloud servers, desktop PCs, medical systems, and wearable devices. Exploiting parallelism and decreasing redundant power dissipation by fine grain power control for multicore/manycore systems are promising approaches, which can ens...

Scalable Lossless High Definition Image Coding on Multicore Platforms

With the advent of multicores in all processor segments including mobile, embedded, desktop and server ones, we are in the new era of multiplying computing power via scaling the number of cores. The multicore approach is more versatile and programmable than the ASIC approach. For instance, the same multicore product can be adapted to the ever-improving potpourri image processing standards. Deve...

Smoothing non-uniform communication latencies for OLTP

Transaction processing applications traditionally run on high-end servers. Until recently, such servers had uniform core-to-core communication latencies. Now, with multisocket multicores, for the first time we have Islands, i.e., groups of cores that communicate very fast with cores that belong to the same group and several times slower with cores from other groups. In current mainstream ...

Publication date: 2009